sample compression
Private Distribution Learning with Public Data: The View from Sample Compression
We study the problem of private distribution learning with access to public data. In this setup, which we refer to as *public-private learning*, the learner is given public and private samples drawn from an unknown distribution $p$ belonging to a class $\mathcal Q$, with the goal of outputting an estimate of $p$ while adhering to privacy constraints (here, pure differential privacy) only with respect to the private samples. We show that the public-private learnability of a class $\mathcal Q$ is connected to the existence of a sample compression scheme for $\mathcal Q$, as well as to an intermediate notion we refer to as *list learning*. Leveraging this connection, we (1) approximately recover previous results on Gaussians over $\mathbb R^d$, and (2) obtain new ones, including sample complexity upper bounds for arbitrary $k$-mixtures of Gaussians over $\mathbb R^d$, results for agnostic and distribution-shift-resistant learners, and closure properties of public-private learnability under taking mixtures and products of distributions. Finally, via the connection to list learning, we show that for Gaussians in $\mathbb R^d$, at least $d$ public samples are necessary for private learnability, which is close to the known upper bound of $d+1$ public samples.
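For reference, the privacy constraint invoked above is standard pure differential privacy; the following statement is the textbook definition, not anything specific to this paper:

```latex
% A randomized learner $A$ is $\varepsilon$-differentially private if, for
% every pair of private datasets $S, S'$ differing in a single sample and
% every measurable set of outputs $E$,
\Pr[A(S) \in E] \;\le\; e^{\varepsilon} \, \Pr[A(S') \in E].
% In the public-private setting, this requirement is imposed only with
% respect to the private samples; public samples are used unconstrained.
```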
Review summary: The paper studies the problem of finding a small subset S* of a set S of labelled points such that the 1-nearest-neighbour classifier built on S* is consistent with S. The motivation is to speed up 1-NN classification of new points. Finding a minimal such S* is known to be NP-hard, so the paper is concerned with approximations; apparently, all previous results on this problem concerned heuristics. The present paper gives an algorithm whose approximation ratio is shown to be optimal, in the sense that doing significantly better is NP-hard.
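One of the classical heuristics alluded to above is Hart's Condensed Nearest Neighbor rule. A minimal Python sketch is below; this is the standard baseline, not the reviewed paper's approximation algorithm, and carries no approximation guarantee:

```python
import numpy as np

def condensed_nearest_neighbor(X, y):
    """Hart's CNN heuristic: greedily grow a subset S* such that the
    1-NN classifier over S* labels every point of (X, y) correctly.
    A classical baseline with no optimality guarantee."""
    keep = [0]  # seed the condensed set with an arbitrary point
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            # label point i by its nearest neighbor in the current subset
            dists = np.linalg.norm(X[keep] - X[i], axis=1)
            pred = y[keep][np.argmin(dists)]
            if pred != y[i]:        # misclassified: absorb it into S*
                keep.append(i)
                changed = True
    return np.array(keep)
```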
Sample Compression for Continual Learning
Comeau, Jacob, Bazinet, Mathieu, Germain, Pascal, Subakan, Cem
Continual learning algorithms aim to learn from a sequence of tasks, making the training distribution non-stationary. The majority of existing continual learning approaches in the literature rely on heuristics and do not provide learning guarantees for the continual learning setup. In this paper, we present a new method called 'Continual Pick-to-Learn' (CoP2L), which efficiently retains the most representative samples for each task. The algorithm is adapted from the Pick-to-Learn algorithm, rooted in sample compression theory. This allows us to provide high-confidence upper bounds on the generalization loss of the learned predictors, numerically computable after every update of the learned model. We also empirically show on several standard continual learning benchmarks that our algorithm is able to outperform standard experience replay, significantly mitigating catastrophic forgetting.
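As background, the Pick-to-Learn selection loop that CoP2L adapts can be sketched roughly as follows. This is a paraphrase from the abstract, not the authors' code; `fit` and `loss` are hypothetical placeholder interfaces:

```python
import numpy as np

def pick_to_learn(X, y, fit, loss, tol=0.0):
    """A sketch in the spirit of Pick-to-Learn: repeatedly train on the
    current compression set, then absorb the worst-predicted remaining
    sample, until every sample is explained within tolerance.
    Assumes fit(X, y) -> model and loss(model, x, y) -> float."""
    picked = [0]                                 # seed with one sample
    while True:
        model = fit(X[picked], y[picked])
        losses = np.array([loss(model, X[i], y[i]) for i in range(len(X))])
        losses[picked] = -np.inf                 # already in the set
        worst = int(np.argmax(losses))
        if losses[worst] <= tol:                 # all samples explained
            return model, picked                 # picked = compression set
        picked.append(worst)
```

The returned `picked` indices form the compression set, which is what makes the sample compression generalization bounds mentioned above numerically computable.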
Near-optimal sample compression for nearest neighbors
Lee-Ad Gottlieb, Aryeh Kontorovich, Pinhas Nisnevitch
We present the first sample compression algorithm for nearest neighbors with nontrivial performance guarantees. We complement these guarantees by demonstrating almost matching hardness lower bounds, which show that our bound is nearly optimal. Our result yields new insight into margin-based nearest neighbor classification in metric spaces and allows us to significantly sharpen and simplify existing bounds. Some encouraging empirical results are also presented.
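The guarantees here are margin-based. A minimal sketch of the underlying idea — compress the labelled sample to a γ-net and classify new points by 1-NN over the net — is below, written for Euclidean distance for concreteness (the paper works in general metric spaces, and this is an illustration of the idea rather than the paper's exact algorithm):

```python
import numpy as np

def gamma_net_compress(X, y, gamma):
    """Greedy gamma-net: keep a point only if it lies at distance > gamma
    from every point kept so far. When the sample has margin on the order
    of gamma, 1-NN over the net agrees with 1-NN over the full sample."""
    net = []
    for i in range(len(X)):
        if all(np.linalg.norm(X[i] - X[j]) > gamma for j in net):
            net.append(i)
    return X[net], y[net]

def predict_1nn(X_net, y_net, x):
    # classify x by its nearest neighbor in the compressed set
    return y_net[np.argmin(np.linalg.norm(X_net - x, axis=1))]
```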
Sample Compression Hypernetworks: From Generalization Bounds to Meta-Learning
Leblanc, Benjamin, Bazinet, Mathieu, D'Amours, Nathaniel, Drouin, Alexandre, Germain, Pascal
Reconstruction functions are pivotal in sample compression theory, a framework for deriving tight generalization bounds. From a small sample of the training set (the compression set) and an optional stream of information (the message), they recover a predictor previously learned from the whole training set. While reconstruction functions are usually fixed, we propose to learn them. To facilitate the optimization and increase the expressiveness of the message, we derive a new sample compression generalization bound for real-valued messages. From this theoretical analysis, we then present a new hypernetwork architecture that outputs predictors with tight generalization guarantees when trained using an original meta-learning framework. The results of promising preliminary experiments are then reported.
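For contrast with the learned reconstruction functions proposed here, a minimal example of a *fixed* reconstruction function is rebuilding a max-margin separator from the compression set alone (here via scikit-learn; an illustration of the classical notion, not this paper's architecture):

```python
from sklearn.svm import SVC

def reconstruct(compression_X, compression_y):
    """A fixed reconstruction function: the compression set alone
    determines the predictor (a near-hard-margin linear SVM, since the
    support vectors suffice to recover the separator). Sample compression
    bounds then apply to the reconstructed predictor."""
    return SVC(kernel="linear", C=1e6).fit(compression_X, compression_y)
```

The hypernetwork proposed in the paper replaces such a fixed map with a learned one that additionally consumes a real-valued message.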
Nearest-Neighbor Sample Compression: Efficiency, Consistency, Infinite Dimensions
Aryeh Kontorovich, Sivan Sabato, Roi Weiss
We examine the Bayes-consistency of a recently proposed 1-nearest-neighbor-based multiclass learning algorithm. This algorithm is derived from sample compression bounds and enjoys the statistical advantages of tight, fully empirical generalization bounds, as well as the algorithmic advantages of a faster runtime and memory savings. We prove that this algorithm is strongly Bayes-consistent in metric spaces with finite doubling dimension -- the first consistency result for an efficient nearest-neighbor sample compression scheme. Rather surprisingly, we discover that this algorithm continues to be Bayes-consistent even in a certain infinite-dimensional setting, in which the basic measure-theoretic conditions on which classic consistency proofs hinge are violated. This is all the more surprising since it is known that k-NN is not Bayes-consistent in this setting. We pose several challenging open problems for future research.
Bounded Memory Active Learning through Enriched Queries
Hopkins, Max, Kane, Daniel, Lovett, Shachar, Moshkovitz, Michal
The explosive growth of easily accessible unlabeled data has led to growing interest in active learning, a paradigm in which data-hungry learning algorithms adaptively select informative examples in order to lower prohibitively expensive labeling costs. Unfortunately, in standard worst-case models of learning, the active setting often provides no improvement over non-adaptive algorithms. To combat this, a series of recent works have considered a model in which the learner may ask enriched queries beyond labels. While such models have seen success in drastically lowering label costs, they tend to come at the expense of requiring large amounts of memory. In this work, we study what families of classifiers can be learned in bounded memory. To this end, we introduce a novel streaming variant of enriched-query active learning, along with a natural combinatorial parameter called lossless sample compression that is sufficient for learning not only with bounded memory, but in a query-optimal and computationally efficient manner as well. Finally, we give three fundamental examples of classifier families with small, easy-to-compute lossless compression schemes when given access to basic enriched queries: axis-aligned rectangles, decision trees, and halfspaces in two dimensions.
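For the axis-aligned rectangle case, the classical compression scheme keeps only the axis-extreme positive examples, from which the smallest enclosing box is reconstructed exactly. A minimal sketch of that standard scheme (our illustration, not necessarily the paper's lossless scheme, and assuming at least one positive example):

```python
import numpy as np

def compress_rectangle(X, y):
    """Keep at most 2d positive points: for each axis, one attaining the
    minimum and one attaining the maximum among the positives. These
    points determine the smallest axis-aligned box containing all
    positive examples, so the target rectangle is recovered exactly."""
    pos = X[y == 1]
    keep = set()
    for axis in range(X.shape[1]):
        keep.add(int(np.argmin(pos[:, axis])))
        keep.add(int(np.argmax(pos[:, axis])))
    return pos[sorted(keep)]

def reconstruct_rectangle(points):
    # the reconstruction function: recover the box from the kept points
    lo, hi = points.min(axis=0), points.max(axis=0)
    return lambda x: bool(np.all(x >= lo) and np.all(x <= hi))
```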